Biostatistics For Dummies (Monika Wahi John Pezzullo)

influencing the outcome more, because they are fighting over explaining the variability in the
dependent variable. Although models with collinearity are valid, they are hard to interpret if you are
looking for cause-and-effect relationships, meaning you are doing causal inference. Chapter 20
provides philosophical guidance on dealing with collinearity in modeling.

Calculating How Many Participants You Need

Your study should enroll enough participants to give you a high probability of getting a statistically
significant result for your primary research hypothesis if the effect you’re testing in that
hypothesis is large enough to be of clinical importance. So if the main hypothesis of your study is
going to be tested by a multiple regression analysis, you should theoretically do a calculation to
determine the sample size you need to support that analysis.

Unfortunately, that is usually not practical, because the equations are too complicated. Instead,
sample-size planning focuses on gathering enough data to support the planned regression model.
Imagine that you plan to gather data about a categorical variable where you believe only 5 percent
of the participants will fall in a particular level. If you want to include that level in your
regression analysis, you need to greatly increase your target sample size so that enough
participants end up in that level. Although regression models tend to converge in software if the
data include at least 100 rows, convergence is not guaranteed. It depends on how many distinct
values the predictor variables and the outcome take and on how those values are distributed. It is
best to use experience from similar studies to help you develop a target sample size and analytic
plan for a multiple regression analysis.
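
To see why a rare level drives up the target sample size, it can help to run the arithmetic. The sketch below is illustrative only, not a formal sample-size method. The function names are made up for this example, and the minimum of 20 participants in the level is a hypothetical planning threshold rather than a rule from this chapter. Only the 5 percent figure comes from the scenario above.

```python
import math
from fractions import Fraction


def expected_count(n, proportion):
    """Expected number of participants in a level that holds `proportion` of the sample."""
    return n * proportion


def min_sample_size(min_in_level, proportion):
    """Smallest total n whose expected count in the level reaches `min_in_level`.

    The proportion is converted to an exact fraction so that rounding up
    is not thrown off by floating-point error.
    """
    exact = Fraction(str(proportion))
    return math.ceil(min_in_level / exact)


def prob_at_least(n, proportion, min_in_level):
    """Binomial probability that at least `min_in_level` of n participants fall in the level."""
    below = sum(
        math.comb(n, k) * proportion**k * (1 - proportion) ** (n - k)
        for k in range(min_in_level)
    )
    return 1 - below


if __name__ == "__main__":
    rare = 0.05        # 5 percent of participants, as in the scenario above
    wanted = 20        # hypothetical minimum count in the rare level (an assumption)

    print(expected_count(100, rare))          # expected count in the level with 100 rows
    n_needed = min_sample_size(wanted, rare)
    print(n_needed)                           # total n whose expected count reaches 20
    print(prob_at_least(n_needed, rare, wanted))  # chance of actually reaching 20 at that n
```

With 100 rows, you would expect only about 5 participants in the rare level. Note that enrolling exactly the sample size whose expected count equals the threshold still leaves a substantial chance of falling short, which is one reason to pad the target further.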